242 research outputs found

    Inter-workgroup barrier synchronisation on graphics processing units

    GPUs are parallel devices that are able to run thousands of independent threads concurrently. Traditional GPU programs are data-parallel, requiring little to no communication, i.e. synchronisation, between threads. However, classical concurrency in the context of CPUs often exploits synchronisation idioms that are not supported on GPUs. By studying such idioms on GPUs, with an aim to facilitate them in a portable way, a wider and more generic space of GPU applications can be made possible. While the breadth of this thesis extends to many aspects of GPU systems, the common thread throughout is the global barrier: an execution barrier that synchronises all threads executing a GPU application. The idea of such a barrier might seem straightforward; however, this investigation reveals many challenges and insights. In particular, this thesis includes the following studies: Execution models: while a general global barrier can deadlock due to starvation on GPUs, it is shown that the scheduling guarantees of current GPUs can be used to dynamically create an execution environment that allows for a safe and portable global barrier across a subset of the GPU threads. Application optimisations: a set of GPU optimisations tailored for graph applications is examined, including one optimisation enabled by the global barrier. It is shown that these optimisations can provide substantial performance improvements, e.g. the barrier optimisation achieves a speedup of over 10X on AMD and Intel GPUs. The performance portability of these optimisations is investigated, as their utility varies across input, application, and architecture. Multitasking: because many GPUs do not support preemption, long-running GPU compute tasks (e.g. applications that use the global barrier) may block other GPU functions, including graphics. A simple cooperative multitasking scheme is proposed that allows graphics tasks to meet their deadlines with reasonable overheads.

    Master of Science

    Graphics Processing Units (GPUs) are highly parallel shared memory microprocessors, and as such, they are prone to the same concurrency considerations as their traditional multicore CPU counterparts. In this thesis, we consider shared memory consistency, i.e. what values can be read when issued concurrently with writes on current GPU hardware. While memory consistency has been relatively well studied for CPUs, GPUs present substantially different concurrency systems with an explicit thread and memory hierarchy. Because documentation on GPU memory models is limited, it remains unclear what behaviors are allowed by current GPU implementations. To this end, this work focuses on testing shared memory consistency behavior on NVIDIA GPUs. We present a format for describing GPU memory consistency tests (dubbed GPU litmus tests) which includes the placement of testing threads into the GPU thread hierarchy (e.g. cooperative thread arrays, warps) and memory locations into GPU memory regions (e.g. shared, global). We then present a framework for running GPU litmus tests under system stress designed to trigger weak memory model behaviors, that is, executions that do not correspond to an interleaving of the instructions of the concurrent program. We discuss GPU-specific incantations (i.e. heuristics) which we found to be crucial for observing weak memory model executions; these include bank conflicts and custom GPU memory stressing functions. We then report the results of running GPU litmus tests in this framework and show that we observe a controversial relaxed coherence behavior on older NVIDIA chips. We present several examples of published GPU applications which may exhibit unintended behavior due to the lack of fence synchronization; one such example is a spin-lock published in the popular CUDA by Example book. We then test several families of tests and compare our results to a proposed operational GPU memory model and show that the model is unsound (i.e. disallows behaviors that we observe on hardware). Our techniques are implemented in a modified version of a memory model testing tool named litmus.
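
    The abstract defines a weak memory behavior as an execution that does not correspond to any interleaving of the program's instructions. As a rough illustration of that definition (not the thesis's litmus format or tool), the sketch below enumerates every interleaving of a two-thread message-passing test and collects the register outcomes they can produce. The test itself, the tuple encoding of instructions, and all function names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical message-passing (MP) test; shared locations x and y start at 0.
# Thread 0 writes the data, then the flag; thread 1 reads the flag, then the data.
THREAD0 = [("store", "x", 1), ("store", "y", 1)]
THREAD1 = [("load", "y", "r0"), ("load", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving against a single shared memory; return (r0, r1)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for op, location, arg in schedule:
        if op == "store":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return registers["r0"], registers["r1"]


def interleaving_outcomes():
    """Set of (r0, r1) results reachable by some interleaving."""
    return {run(s) for s in interleavings(THREAD0, THREAD1)}


if __name__ == "__main__":
    allowed = interleaving_outcomes()
    print("outcomes explained by an interleaving:", sorted(allowed))
    print("(r0=1, r1=0) explained by an interleaving:", (1, 0) in allowed)
```

    Under these assumptions, the outcome r0=1, r1=0 (flag seen, data stale) is not produced by any interleaving, so observing it on hardware would count as a weak behavior in the abstract's sense.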

    Changing Campustown

    Mickey’s Irish Pub was not filled with the usual smell of stale beer and slurred pick-up lines. Instead, its dwellers asked questions and raised concerns about the project that LANE4 Property Management and the City of Ames plan to wreak on Campustown in as little as a year. Business owners crowded onto the sticky floors of the popular bar on Welch Avenue in hopes of getting their questions answered and understanding where the LANE4 wrecking ball would be making its impression. Tim Schrum, general manager of Mickey’s, organized the meeting March 3 so Campustown business and property owners could ask questions about the future of their businesses, and they heard advice from a lawyer who was present.

    Exposing errors related to weak memory in GPU applications

    © 2016 ACM. We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three NVIDIA architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in our testing environment. Our testing environment also provides a means to help identify the root causes of such errors, and automatically suggests how to insert fences that harden an application against weak memory bugs. To understand the cost of GPU fences, we benchmark applications with fences provided by the hardening strategy as well as a more conservative, sound fencing strategy.

    Portable Inter-workgroup Barrier Synchronisation for GPUs

    Despite the growing popularity of GPGPU programming, there is not yet a portable and formally-specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence, deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that the barrier meets its natural specification in terms of synchronisation. We assess the portability of our approach over eight GPUs spanning four vendors, comparing the performance of our method against alternative methods. Our key findings include: (1) the recall of our discovery protocol is nearly 100%; (2) runtime comparisons vary substantially across GPUs and applications; and (3) our method provides portable and safe inter-workgroup synchronisation across the applications we study
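
    The abstract describes restricting a barrier to a discovered, safe number of workgroups. The sketch below is a CPU-thread analogy of that idea, not the paper's protocol or its OpenCL implementation: a semaphore stands in for the GPU's occupancy bound, the first workgroup to check in closes a poll after a short window, and only workgroups that checked in before then take part in the barrier. The name `run_kernel`, the `poll_window` delay, and the poll structure are all assumptions made for illustration.

```python
import threading
import time


def run_kernel(launched, resident_slots, rounds=3, poll_window=0.05):
    """Return (occupancy estimate, sorted ids of workgroups that passed every barrier)."""
    slots = threading.Semaphore(resident_slots)  # stand-in for the occupancy bound
    poll_lock = threading.Lock()
    poll_closed = threading.Event()
    state = {"open": True, "count": 0, "barrier": None}
    finished = []

    def workgroup():
        with slots:  # a workgroup only makes progress while it holds a slot
            with poll_lock:
                if not state["open"]:
                    return  # arrived after the poll closed: exit and free the slot
                my_id = state["count"]
                state["count"] += 1
            if my_id == 0:
                # The first workgroup to check in keeps the poll open briefly,
                # then closes it and fixes the number of barrier participants.
                time.sleep(poll_window)
                with poll_lock:
                    state["open"] = False
                    state["barrier"] = threading.Barrier(state["count"])
                poll_closed.set()
            else:
                poll_closed.wait()
            # Every participant holds a slot, so none can be starved at the barrier.
            for _ in range(rounds):
                state["barrier"].wait()
            finished.append(my_id)

    threads = [threading.Thread(target=workgroup) for _ in range(launched)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"], sorted(finished)


if __name__ == "__main__":
    estimate, finished = run_kernel(launched=16, resident_slots=4)
    print("estimated occupancy:", estimate)
    print("workgroups that completed all barrier rounds:", finished)
```

    In this analogy the estimate can never exceed the number of slots, because a workgroup must hold a slot before it can check in; how close it gets to the true bound depends on timing, which is why the sketch makes no claim about the exact value returned.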

    Women and Illegal Activities: Gender Differences and Women's Willingness to Comply Over Time

    In recent years the topics of illegal activities such as corruption or tax evasion have attracted a great deal of attention. However, there is still a lack of substantial empirical evidence about the determinants of compliance. The aim of this paper is to investigate empirically whether women are more willing to be compliant than men and whether we observe (among women and in general) differences in attitudes among similar age groups in different time periods (cohort effect) or changing attitudes of the same cohorts over time (age effect) using data from eight Western European countries from the World Values Survey and the European Values Survey that span the period from 1981 to 1999. The results reveal higher willingness to comply among women and an age rather than a cohort effect. Working Paper 06-5

    Chemical cues and pheromones in the sea lamprey (Petromyzon marinus)

    Chemical cues and pheromones guide decisions in organisms throughout the animal kingdom. The neurobiology, function, and evolution of olfaction are particularly well described in insects, and resulting concepts have driven novel approaches to pest control. However, aside from several exceptions, the olfactory biology of vertebrates remains poorly understood. One exception is the sea lamprey (Petromyzon marinus), which relies heavily upon olfaction during reproduction. Here, we provide a broad review of the chemical cues and pheromones used by the sea lamprey during reproduction, including overviews of the sea lamprey olfactory system, chemical cues and pheromones, and potential applications to population management. The critical role of olfaction in mediating the sea lamprey life cycle is evident by a well-developed olfactory system. Sea lamprey use chemical cues and pheromones to identify productive spawning habitat, coordinate spawning behaviors, and avoid risk. Manipulation of olfactory biology offers opportunities for management of populations in the Laurentian Great Lakes, where the sea lamprey is a destructive invader. We suggest that the sea lamprey is a broadly useful organism with which to study vertebrate olfaction because of its simple but well-developed olfactory organ, the dominant role of olfaction in guiding behaviors during reproduction, and the direct implications for vertebrate pest management